Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read dataset #3

Open
wants to merge 85 commits into
base: main
Choose a base branch
from
Open

Read dataset #3

wants to merge 85 commits into from

Conversation

crdanielbusch
Copy link
Contributor

@crdanielbusch crdanielbusch commented Nov 18, 2024

Description

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Notes

  • Added an integration test that runs download and read script
  • Unit tests did not seem helpful

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "poetry run python3 scripts/read_all_domains.py",
 "dsid": "934d913e-8268-4342-aa0c-3702ee516d10",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "poetry run python3 scripts/read_all_domains.py",
 "dsid": "934d913e-8268-4342-aa0c-3702ee516d10",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
@crdanielbusch crdanielbusch marked this pull request as ready for review November 20, 2024 14:40
@crdanielbusch crdanielbusch self-assigned this Nov 21, 2024
Copy link

@JGuetschow JGuetschow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most looks good, but the make targets don't work for me because of locking issues. And some more smaller comments.

Makefile Outdated
@@ -82,3 +82,8 @@ virtual-environment: ## update virtual environment, create a new one if it does
download_all_domains:
# downloads and stages (datalad save) all available data
datalad run poetry run python3 scripts/download_all_domains.py

.PHONY: read_latest_data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this fail is you run it when not connected to the internet? I usually don't link read to download so data is not downloaded again if I have to read it again because of a processing code change

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, realized it doesn't fail, because make recognizes that the files are there. But it will not check for a new version.

This make target says "nothing to be done" after changing the data reading code. This should be changed, so we can re-read data after making changes to the code.

@@ -0,0 +1,8 @@
"""Read the latest release of all available domains."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More general comment: I think it would e good to have an option to re-read older data as well,e.g. if we read additional data or variables.

from pathlib import Path


def get_root_path(root_indicator: str = ".git") -> Path:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to use an indicator file. This avoids setting an environment variable as in the UNFCCC_non-annexI_data repo.

Makefile Outdated
@@ -82,3 +82,8 @@ virtual-environment: ## update virtual environment, create a new one if it does
download_all_domains:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me download_all_domains tries to read the data into primap2 format. This is not what I would intuitively expect it to do (also it fails)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails because the target files are locked.

dimensions:
'*':
- time
- category (FAOSTAT)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Category is to be changed, after the category tree is done?

crdanielbusch and others added 30 commits December 10, 2024 12:51
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "python3 scripts/download_all_domains.py",
 "dsid": "934d913e-8268-4342-aa0c-3702ee516d10",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [
  "downloaded_data"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "python3 scripts/download_all_domains.py",
 "dsid": "934d913e-8268-4342-aa0c-3702ee516d10",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [
  "downloaded_data"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "python3 scripts/download_all_domains.py",
 "dsid": "934d913e-8268-4342-aa0c-3702ee516d10",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [
  "downloaded_data"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
# Conflicts:
#	src/faostat_data_primap/download.py
…aset

# Conflicts:
#	poetry.lock
#	pyproject.toml
#	requirements.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants